Importing the data

As a person of many talents, you’re ready to take on a different job: nutrition analysis! Your goal is to analyze the sugar content of a sample of foods from around the world.

A large dataset called food.csv is ready for your use in the working directory. Instead of the usual read.csv(), however, you’re going to use the faster fread() from the data.table package. By default, fread() returns a data.table, but since you’re used to working with data frames, you can have it return one by setting data.table = FALSE.

[Note: In order to make these exercises manageable, we’ve taken a random subset of the original data. The dataset you’ll be working with may not be large enough for fread() to make a huge difference, but be aware that there will be times when read.csv() just won’t cut it.]
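To make the effect of data.table = FALSE concrete, here is a minimal sketch. It assumes a small throwaway CSV written to a temporary file; the column names and values are made up for illustration and are not the real food.csv.

```r
library(data.table)

# Write a tiny, hypothetical CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, sugars_100g = c(54, 4.2, 2.7)),
          tmp, row.names = FALSE)

# Default behaviour: fread() returns a data.table
# (which also inherits from data.frame)
dt <- fread(tmp)
class(dt)

# With data.table = FALSE, fread() returns a plain data frame
df <- fread(tmp, data.table = FALSE)
class(df)

# Base R equivalent, for comparison
df_base <- read.csv(tmp)
```

On a file this small the two readers are indistinguishable in speed; the difference only becomes noticeable on much larger files.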

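# Load data.table for fread()
library(data.table)
# Load knitr and kableExtra for the formatted tables below
# (kableExtra also re-exports the %>% pipe)
library(knitr)
library(kableExtra)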

# Import food.csv as a data frame: food
food <- fread("../xDatasets/food.csv", data.table = FALSE)
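
# View summary of food
# Note: summary() returns vectors of different lengths for numeric, character,
# and all-NA logical columns, so cbind() recycles the shorter ones (hence the
# warning below) and the rows for non-numeric columns are not meaningful.
sum_food <- as.data.frame(do.call(cbind, lapply(food, summary)))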
## Warning in (function (..., deparse.level = 1) : number of rows of result is
## not a multiple of vector length (arg 1)
sum_food %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 11) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3f7689")
V1 code url creator created_t created_datetime last_modified_t last_modified_datetime product_name generic_name quantity packaging packaging_tags brands brands_tags categories categories_tags categories_en origins origins_tags manufacturing_places manufacturing_places_tags labels labels_tags labels_en emb_codes emb_codes_tags first_packaging_code_geo cities cities_tags purchase_places stores countries countries_tags countries_en ingredients_text allergens allergens_en traces traces_tags traces_en serving_size no_nutriments additives_n additives additives_tags additives_en ingredients_from_palm_oil_n ingredients_from_palm_oil ingredients_from_palm_oil_tags ingredients_that_may_be_from_palm_oil_n ingredients_that_may_be_from_palm_oil ingredients_that_may_be_from_palm_oil_tags nutrition_grade_uk nutrition_grade_fr pnns_groups_1 pnns_groups_2 states states_tags states_en main_category main_category_en image_url image_small_url energy_100g energy_from_fat_100g fat_100g saturated_fat_100g butyric_acid_100g caproic_acid_100g caprylic_acid_100g capric_acid_100g lauric_acid_100g myristic_acid_100g palmitic_acid_100g stearic_acid_100g arachidic_acid_100g behenic_acid_100g lignoceric_acid_100g cerotic_acid_100g montanic_acid_100g melissic_acid_100g monounsaturated_fat_100g polyunsaturated_fat_100g omega_3_fat_100g alpha_linolenic_acid_100g eicosapentaenoic_acid_100g docosahexaenoic_acid_100g omega_6_fat_100g linoleic_acid_100g arachidonic_acid_100g gamma_linolenic_acid_100g dihomo_gamma_linolenic_acid_100g omega_9_fat_100g oleic_acid_100g elaidic_acid_100g gondoic_acid_100g mead_acid_100g erucic_acid_100g nervonic_acid_100g trans_fat_100g cholesterol_100g carbohydrates_100g sugars_100g sucrose_100g glucose_100g fructose_100g lactose_100g maltose_100g maltodextrins_100g starch_100g polyols_100g fiber_100g proteins_100g casein_100g serum_proteins_100g nucleotides_100g salt_100g sodium_100g alcohol_100g vitamin_a_100g beta_carotene_100g vitamin_d_100g vitamin_e_100g 
vitamin_k_100g vitamin_c_100g vitamin_b1_100g vitamin_b2_100g vitamin_pp_100g vitamin_b6_100g vitamin_b9_100g vitamin_b12_100g biotin_100g pantothenic_acid_100g silica_100g bicarbonate_100g potassium_100g chloride_100g calcium_100g phosphorus_100g iron_100g magnesium_100g zinc_100g copper_100g manganese_100g fluoride_100g selenium_100g chromium_100g molybdenum_100g iodine_100g caffeine_100g taurine_100g ph_100g fruits_vegetables_nuts_100g collagen_meat_protein_ratio_100g cocoa_100g chlorophyl_100g carbon_footprint_100g nutrition_score_fr_100g nutrition_score_uk_100g
Min. 1 100030 1500 1500 1332073018 1500 1340209117 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 logical 1500 1500 1500 1500 1500 1500 1500 1500 logical 1500 1500 1500 1500 logical 0 1500 1500 1500 0 logical 1500 0 logical 1500 logical 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 0 0 0 0 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 0 0.4 0.033 0.08 0.721 1.09 0.25 0.5 logical logical logical logical logical logical logical logical logical logical 0 0 0 0 logical logical 100 0 logical logical 0 8.6 0 0 1.1 logical logical 0 0 0 0 logical 7.5e-07 5e-04 5.3e-06 0 6e-05 0.000176 0.00059 6.6e-05 1.13e-05 2e-07 1.9e-06 9e-07 0.00082 0.00063 4e-05 3e-04 0 0.043 0 5e-05 5e-04 3.6e-05 6.5e-06 2.7e-06 1.44e-06 logical logical 1e-05 logical logical logical 2 12 30 logical 12 -12 -12
1st Qu. 375.75 124974.5 character character 1393744722 character 1424291534.75 character character character character character character character character character character character character character character character character character character character character character 1500 character character character character character character character character 1500 character character character character 1500 0 character character character 0 1500 character 0 1500 character 1500 character character character character character character character character character character 369.75 35.975 0.9 0.2 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 3.87 1.6525 1.3 0.0905 0.721 1.09 0.25 0.5165 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 0 0 3.7925 1 1500 1500 100 0.25 1500 1500 9.45 59.1 0.5 1.5 1.1 1500 1500 0.04375 0.0172244094488189 0 0 1500 9.5e-07 0.002125 6.85e-06 0.002 0.0002925 0.00026 0.003325 0.00023 5e-05 4e-07 3.3e-06 0.000685 0.00082 0.067815 0.065 6e-04 0.045 0.19375 0.0012 0.067 9e-04 6.025e-05 6.5e-06 4.525e-06 1.44e-06 1500 1500 1e-05 1500 1500 1500 11.25 13.5 47 1500 97.425 1 0
Median 750.5 149514 character character 1424746734.5 character 1436867403 character character character character character character character character character character character character character character character character character character character character character logical character character character character character character character character logical character character character character logical 1 character character character 0 logical character 0 logical character logical character character character character character character character character character character 966.5 237 6 1.7 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 9.5 3.9 3 0.101 0.721 1.09 0.25 0.533 logical logical logical logical logical logical logical logical logical logical 0 0 13.5 4.05 logical logical 100 0.5 logical logical 39.5 67 1.75 6 1.1 logical logical 0.44979 0.177082677165355 5.5 7e-05 logical 3e-06 0.0044 8.4e-06 0.019 0.00045 0.00093 0.0069 8e-04 7.3e-05 2e-06 4.7e-06 0.00195 0.00082 0.135 0.194 9e-04 0.12 0.3185 0.0042 0.104 0.00167 8.45e-05 6.5e-06 6.35e-06 1.44e-06 logical logical 1e-05 logical logical logical 42 15 60 logical 182.85 7 6
Mean 750.5 149612.94 1500 1500 1413694024.41 1500 1430317795.218 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1.84584178498986 1500 1500 1500 0.0486815415821501 1500 1500 0.137931034482759 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1083.235895 668.407142857143 13.3945006313131 4.87399004267425 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 19.7731428571429 9.98555555555556 3.72588888888889 0.173666666666667 0.721 1.09 0.25 0.533 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 0.0105263157894737 0.0264565217391304 27.9578118686869 12.6564831460674 1500 1500 100 2.93333333333333 1500 1500 30.7285714285714 56.0555555555556 2.82298913043478 7.56324050632911 1.1 1500 1500 1.12053058111111 0.440933823928259 10.0671641791045 0.000303926086956522 1500 1.29393333333333e-05 0.00689818181818182 8.4e-06 0.024971487804878 0.000605 0.00111858823529412 0.008555625 0.0112242105263158 0.000110858823529412 1.42272727272727e-06 4.7e-06 0.00267827857142857 0.00082 0.16921 0.328764615384615 0.0144 0.203958235294118 0.377666666666667 0.00454708108108108 0.106559523809524 0.00158142857142857 8.45e-05 6.5e-06 6.35e-06 1.44e-06 1500 1500 1e-05 1500 1500 1500 36.885 15.6666666666667 57 1500 131.183333333333 7.94074074074074 7.63111111111111
3rd Qu. 1125.25 174505.75 character character 1436494439 character 1445896711.75 character character character character character character character character character character character character character character character character character character character character character logical character character character character character character character character logical character character character character logical 3 character character character 0 logical character 0 logical character logical character character character character character character character character character character 1641.5 974 20 6.5 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 29 12.7 3.2 0.2205 0.721 1.09 0.25 0.5495 logical logical logical logical logical logical logical logical logical logical 0 0.002625 55 14.7 logical logical 100 4.4 logical logical 42.85 69.8 3.5 10.675 1.1 logical logical 1.1938 0.47 13 0.0005975 logical 5.5e-06 0.0097 9.95e-06 0.03 0.0009625 0.00127 0.01405 0.001235 0.00017 2.245e-06 6.1e-06 0.005075 0.00082 0.2535 0.367 0.02145 0.1985 0.434 0.00771 0.13 0.00225 0.00010875 6.5e-06 8.175e-06 1.44e-06 logical logical 1e-05 logical logical logical 52.25 17.5 70 logical 190.775 15 16
Max. 1500 199880 character character 1452552527 character 1452553072 character character character character character character character character character character character character character character character character character character character character character 1500 character character character character character character character character 1500 character character character character 1500 17 character character character 1 1500 character 4 1500 character 1500 character character character character character character character character character character 3700 2900 100 57 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 75 46.2 12.4 0.34 0.721 1.09 0.25 0.566 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 0.1 0.43 100 100 1500 1500 100 8.3 1500 1500 71 70 46.7 61 1.1 1500 1500 102 40 50 0.001346 1500 1e-04 0.032 1.15e-05 0.217 0.0013 0.0066 0.016 0.2 0.000237 2.5e-06 7.5e-06 0.006 0.00082 0.372 1.43 0.042 1 1.155 0.0137 0.333 0.0026 0.000133 6.5e-06 1e-05 1.44e-06 1500 1500 1e-05 1500 1500 1500 80 20 81 1500 198.7 28 28
NA’s 1 100030 1500 1500 1332073018 1500 1340209117 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 logical 1500 1500 1500 1500 1500 1500 1500 1500 logical 1500 1500 1500 1500 logical 514 1500 1500 1500 514 logical 1500 514 logical 1500 logical 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 700 1486 708 797 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 1465 1464 1491 1497 1499 1499 1499 1498 logical logical logical logical logical logical logical logical logical logical 1481 1477 708 788 logical logical 1499 1497 logical logical 1493 1491 994 710 1499 logical logical 780 780 1433 1477 logical 1485 1478 1498 1459 1478 1483 1484 1481 1483 1489 1498 1486 1499 1497 1487 1497 1449 1488 1463 1479 1493 1498 1499 1498 1499 logical logical 1499 logical logical logical 1470 1497 1491 logical 1497 825 825
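The warning and the odd-looking rows above come from mixing column types: summary() gives seven values for a numeric column with missing values, three for a character column (Length, Class, Mode), and two for an all-NA logical column, and cbind() recycles the shorter vectors. One possible alternative, sketched here on a made-up toy data frame rather than on food itself, is to summarise only the numeric columns with a helper that always returns the same number of values. The names toy and summarise_col are hypothetical.

```r
# Toy data frame standing in for `food` (values are illustrative only)
toy <- data.frame(
  product_name = c("jam", "cream", "seeds", "soda"),
  sugars_100g  = c(54, 4.2, 2.7, NA),
  fat_100g     = c(0, 16.7, 45.5, NA),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns
num_cols <- toy[vapply(toy, is.numeric, logical(1))]

# Hypothetical helper: a fixed-length summary, so every column
# contributes the same five rows
summarise_col <- function(x) {
  c(Min    = min(x, na.rm = TRUE),
    Median = median(x, na.rm = TRUE),
    Mean   = mean(x, na.rm = TRUE),
    Max    = max(x, na.rm = TRUE),
    NAs    = sum(is.na(x)))
}

# vapply() enforces the length, so nothing is recycled
sum_toy <- as.data.frame(vapply(num_cols, summarise_col, numeric(5)))
sum_toy
```

Columns that are entirely NA would need extra handling before calling min() and max(); that case is left out of this sketch.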
# View head of food
food %>% 
  head()  %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 11) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3f7689") %>%
  scroll_box(width = "100%", height = "300px")
V1 code url creator created_t created_datetime last_modified_t last_modified_datetime product_name generic_name quantity packaging packaging_tags brands brands_tags categories categories_tags categories_en origins origins_tags manufacturing_places manufacturing_places_tags labels labels_tags labels_en emb_codes emb_codes_tags first_packaging_code_geo cities cities_tags purchase_places stores countries countries_tags countries_en ingredients_text allergens allergens_en traces traces_tags traces_en serving_size no_nutriments additives_n additives additives_tags additives_en ingredients_from_palm_oil_n ingredients_from_palm_oil ingredients_from_palm_oil_tags ingredients_that_may_be_from_palm_oil_n ingredients_that_may_be_from_palm_oil ingredients_that_may_be_from_palm_oil_tags nutrition_grade_uk nutrition_grade_fr pnns_groups_1 pnns_groups_2 states states_tags states_en main_category main_category_en image_url image_small_url energy_100g energy_from_fat_100g fat_100g saturated_fat_100g butyric_acid_100g caproic_acid_100g caprylic_acid_100g capric_acid_100g lauric_acid_100g myristic_acid_100g palmitic_acid_100g stearic_acid_100g arachidic_acid_100g behenic_acid_100g lignoceric_acid_100g cerotic_acid_100g montanic_acid_100g melissic_acid_100g monounsaturated_fat_100g polyunsaturated_fat_100g omega_3_fat_100g alpha_linolenic_acid_100g eicosapentaenoic_acid_100g docosahexaenoic_acid_100g omega_6_fat_100g linoleic_acid_100g arachidonic_acid_100g gamma_linolenic_acid_100g dihomo_gamma_linolenic_acid_100g omega_9_fat_100g oleic_acid_100g elaidic_acid_100g gondoic_acid_100g mead_acid_100g erucic_acid_100g nervonic_acid_100g trans_fat_100g cholesterol_100g carbohydrates_100g sugars_100g sucrose_100g glucose_100g fructose_100g lactose_100g maltose_100g maltodextrins_100g starch_100g polyols_100g fiber_100g proteins_100g casein_100g serum_proteins_100g nucleotides_100g salt_100g sodium_100g alcohol_100g vitamin_a_100g beta_carotene_100g vitamin_d_100g vitamin_e_100g 
vitamin_k_100g vitamin_c_100g vitamin_b1_100g vitamin_b2_100g vitamin_pp_100g vitamin_b6_100g vitamin_b9_100g vitamin_b12_100g biotin_100g pantothenic_acid_100g silica_100g bicarbonate_100g potassium_100g chloride_100g calcium_100g phosphorus_100g iron_100g magnesium_100g zinc_100g copper_100g manganese_100g fluoride_100g selenium_100g chromium_100g molybdenum_100g iodine_100g caffeine_100g taurine_100g ph_100g fruits_vegetables_nuts_100g collagen_meat_protein_ratio_100g cocoa_100g chlorophyl_100g carbon_footprint_100g nutrition_score_fr_100g nutrition_score_uk_100g
1 100030 http://world-en.openfoodfacts.org/product/3222475745867/confiture-de-fraise-fraise-des-bois-au-sucre-de-canne-casino-delices sebleouf 1424747544 2015-02-24T03:12:24Z 1438445887 2015-08-01T16:18:07Z Confiture de fraise fraise des bois au sucre de canne 265 g Bocal,Verre bocal,verre Casino Délices casino-delices Aliments et boissons à base de végétaux,Aliments d’origine végétale,Aliments à base de fruits et de légumes,Petit-déjeuners,Produits à tartiner,Fruits et produits dérivés,Pâtes à tartiner végétaux,Produits à tartiner sucrés,Confitures et marmelades,Confitures,Confitures de fruits,Confitures de fruits rouges,Confitures de fraises en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:breakfasts,en:spreads,en:fruits-based-foods,en:plant-based-spreads,en:sweet-spreads,en:fruit-preserves,en:jams,en:fruit-jams,en:berry-jams,en:strawberry-jams Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Breakfasts,Spreads,Fruits based foods,Plant-based spreads,Sweet spreads,Fruit preserves,Jams,Fruit jams,Berry jams,Strawberry jams France france EMB 78015 emb-78015 48.983333,2.066667 NA andresy-yvelines-france Lyon,France Casino France en:france France Sucre de canne, fraises 40 g, fraises des bois 14 g, gélifiant : pectines de fruits, jus de citron concentré. Préparée avec 54 g de fruits pour 100 g de produit fini. 
NA Lait,Fruits à coque en:milk,en:nuts Milk,Nuts 15 g NA 1 [ sucre-de-canne -> fr:sucre-de-canne ] [ sucre-de -> fr:sucre-de ] [ sucre -> fr:sucre ] [ fraises-40-g -> fr:fraises-40-g ] [ fraises-40 -> fr:fraises-40 ] [ fraises -> fr:fraises ] [ fraises-des-bois-14-g -> fr:fraises-des-bois-14-g ] [ fraises-des-bois-14 -> fr:fraises-des-bois-14 ] [ fraises-des-bois -> fr:fraises-des-bois ] [ fraises-des -> fr:fraises-des ] [ fraises -> fr:fraises ] [ pectines-de-fruits -> fr:pectines-de-fruits ] [ pectines-de -> fr:pectines-de ] [ pectines -> en:e440 -> exists ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit-fini -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit-fini ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100 -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100 ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits ] [ jus-de-citron-concentre-preparee-avec-54-g-de -> fr:jus-de-citron-concentre-preparee-avec-54-g-de ] [ jus-de-citron-concentre-preparee-avec-54-g -> fr:jus-de-citron-concentre-preparee-avec-54-g ] [ jus-de-citron-concentre-preparee-avec-54 -> fr:jus-de-citron-concentre-preparee-avec-54 ] [ jus-de-citron-concentre-preparee-avec -> fr:jus-de-citron-concentre-preparee-avec ] [ jus-de-citron-concentre-preparee -> 
fr:jus-de-citron-concentre-preparee ] [ jus-de-citron-concentre -> fr:jus-de-citron-concentre ] [ jus-de-citron -> fr:jus-de-citron ] [ jus-de -> fr:jus-de ] [ jus -> fr:jus ] en:e440 E440 - Pectins 0 NA 0 NA NA d Sugary snacks Sweets en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characteristics completed,Photos validated,Photos uploaded en:plant-based-foods-and-beverages Plant-based foods and beverages http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.400.jpg http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.200.jpg 918 NA 0.0 0.0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 54.0 54.0 NA NA NA NA NA NA NA NA NA 0.0 NA NA NA 0.0000 0.00 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 54 NA NA NA NA 11 11
2 100050 http://world-en.openfoodfacts.org/product/5410976880110/guylian-sea-shells-selection foodorigins 1450316429 2015-12-17T01:40:29Z 1450817956 2015-12-22T20:59:16Z Guylian Sea Shells Selection 375g Plastic,Box plastic,box Guylian guylian Chocolate en:sugary-snacks,en:chocolates Sugary snacks,Chocolates Belgium belgium NA NSW,Australia Australia en:australia Australia NA NA NA NA NA NA NA NA Sugary snacks Chocolate products en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-be-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Characteristics completed,Photos validated,Photos uploaded en:sugary-snacks Sugary snacks http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.400.jpg http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.200.jpg NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 100079 http://world-en.openfoodfacts.org/product/3264750423503/pates-de-fruits-aromatisees-jacquot domdom26 1428674916 2015-04-10T14:08:36Z 1428739289 2015-04-11T08:01:29Z Pâtes de fruits aromatisées Pâtes de fruits 1 kg Carton,plastique carton,plastique Jacquot jacquot pâtes de fruits en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:sugary-snacks,en:confectioneries,en:fruits-based-foods,en:fruit-pastes Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Sugary snacks,Confectioneries,Fruits based foods,Fruit pastes NA France France en:france France Pulpe de pommes 50% , sucre, sirop de glucose, gélifiant : pectine, acidifiant : acide citrique, arômes, colorants naturels : extrait de paprika — complexes cuivre—chlorophyllines — curcumine — antnocyanes NA NA 2 [ pulpe-de-pommes-50 -> fr:pulpe-de-pommes-50 ] [ pulpe-de-pommes -> fr:pulpe-de-pommes ] [ pulpe-de -> fr:pulpe-de ] [ pulpe -> fr:pulpe ] [ sucre -> fr:sucre ] [ sirop-de-glucose -> fr:sirop-de-glucose ] [ sirop-de -> fr:sirop-de ] [ sirop -> fr:sirop ] [ pectine -> en:e440 -> exists ] [ acide-citrique -> en:e330 -> exists ] [ aromes -> fr:aromes ] [ naturels -> fr:naturels ] [ extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine-antnocyanes -> fr:extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine-antnocyanes ] [ extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine -> fr:extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine ] [ extrait-de-paprika-complexes-cuivre-chlorophyllines -> fr:extrait-de-paprika-complexes-cuivre-chlorophyllines ] [ extrait-de-paprika-complexes-cuivre -> fr:extrait-de-paprika-complexes-cuivre ] [ extrait-de-paprika-complexes -> fr:extrait-de-paprika-complexes ] [ extrait-de-paprika -> fr:extrait-de-paprika ] [ extrait-de -> fr:extrait-de ] [ extrait -> fr:extrait ] en:e440,en:e330 E440 - Pectins,E330 - Citric acid 0 NA 0 NA NA Fruits and vegetables Fruits 
en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characteristics completed,Photos validated,Photos uploaded en:plant-based-foods-and-beverages Plant-based foods and beverages http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.400.jpg http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.200.jpg NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4 100094 http://world-en.openfoodfacts.org/product/8006040247001/nata-vegetal-a-base-de-soja-valsoia javichu 1420416591 2015-01-05T00:09:51Z 1420417876 2015-01-05T00:31:16Z Nata vegetal a base de soja &quot;Valsoia&quot; Nata vegetal a base de soja 200 ml Tetra Brik tetra-brik Valsoia,//Propiedad de://,Valsoia S.p.A. valsoia,propiedad-de,valsoia-s-p-a Alimentos y bebidas de origen vegetal,Alimentos de origen vegetal,Natas vegetales,Natas vegetales a base de soja para cocinar,Natas vegetales para cocinar en:plant-based-foods-and-beverages,en:plant-based-foods,en:plant-based-creams,en:plant-based-creams-for-cooking,en:soy-based-creams-for-cooking Plant-based foods and beverages,Plant-based foods,Plant-based creams,Plant-based creams for cooking,Soy-based creams for cooking Italia italia Vegetariano,Vegano,Sin gluten,Sin OMG,Sin lactosa en:vegetarian,en:vegan,en:gluten-free,en:no-gmos,en:no-lactose Vegetarian,Vegan,Gluten-free,No GMOs,No lactose NA Madrid,España El Corte Inglés España en:spain Spain Extracto de soja (78%) (agua, semillas de soja 8,3%), grasas vegetales, jarabe de glucosa, dextrosa, emulsionante: mono- y diglicéridos de ácidos grasos (E-471), sal marina, estabilizantes: goma xantana (E-415), carragenatos (E-407), goma guar (E-412); aromas, antioxidante: extractos de tocoferoles (de soja) (E-306). (Nota: el envase en italiano del paquete -que puede verse en el enlace-, especifica que el producto es 100% vegetal. Por tanto los mono- y diglicéridos de ácidos grasos (E-471) son de origen no animal). 
NA NA 5 [ extracto-de-soja -> es:extracto-de-soja ] [ 78 -> es:78 ] [ agua -> es:agua ] [ semillas-de-soja-8 -> es:semillas-de-soja-8 ] [ 3 -> en:fd-c ] [ grasas-vegetales -> es:grasas-vegetales ] [ jarabe-de-glucosa -> es:jarabe-de-glucosa ] [ dextrosa -> es:dextrosa ] [ emulsionante -> es:emulsionante ] [ mono-y-digliceridos-de-acidos-grasos -> en:e471 -> exists ] [ e471 -> en:e471 ] [ sal-marina -> es:sal-marina ] [ estabilizantes -> es:estabilizantes ] [ goma-xantana -> en:e415 -> exists ] [ e415 -> en:e415 ] [ carragenatos -> en:e407 -> exists ] [ e407 -> en:e407 ] [ goma-guar -> en:e412 -> exists ] [ e412 -> en:e412 ] [ aromas -> es:aromas ] [ antioxidante -> es:antioxidante ] [ extractos-de-tocoferoles -> es:extractos-de-tocoferoles ] [ de-soja -> es:de-soja ] [ e306 -> en:e306 -> exists ] [ nota -> es:nota ] [ el-envase-en-italiano-del-paquete-que-puede-verse-en-el-enlace -> es:el-envase-en-italiano-del-paquete-que-puede-verse-en-el-enlace ] [ especifica-que-el-producto-es-100-vegetal-por-tanto-los-mono-y-digliceridos-de-acidos-grasos -> es:especifica-que-el-producto-es-100-vegetal-por-tanto-los-mono-y-digliceridos-de-acidos-grasos ] [ e471 -> en:e471 ] [ son-de-origen-no-animal -> es:son-de-origen-no-animal ] [ -> es: ] en:e471,en:e415,en:e407,en:e412,en:e306 E471 - Mono- and diglycerides of fatty acids,E415 - Xanthan gum,E407 - Carrageenan,E412 - Guar gum,E306 - Tocopherol-rich extract 0 NA 1 NA e471-mono-et-diglycerides-d-acides-gras-alimentaires NA d unknown unknown en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date completed,Characteristics completed,Photos 
validated,Photos uploaded en:plant-based-foods-and-beverages Plant-based foods and beverages http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.400.jpg http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.200.jpg 766 NA 16.7 9.9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2.9 3.9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2e-04 5.7 4.2 NA NA NA NA NA NA NA NA 0.2 2.9 NA NA NA 0.0508 0.02 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11 11
5 100124 http://world-en.openfoodfacts.org/product/8480000340764/semillas-de-girasol-con-cascara-tostadas-aguasal-hacendado javichu 1420501121 2015-01-05T23:38:41Z 1445700917 2015-10-24T15:35:17Z Semillas de girasol con cáscara tostadas aguasal Semillas de girasol con cáscara tostadas aguasal 200 g Bolsa de plástico,Envasado en atmósfera protectora bolsa-de-plastico,envasado-en-atmosfera-protectora Hacendado,//Propiedad de://,Mercadona S.A. hacendado,propiedad-de,mercadona-s-a Semillas de girasol y derivados, Semillas, Semillas de girasol, Semillas de girasol con cáscara, Semillas de girasol tostadas, Semillas de girasol con cáscara tostadas, Semillas de girasol con cáscara tostadas aguasal en:plant-based-foods-and-beverages,en:plant-based-foods,en:seeds,en:sunflower-seeds-and-their-products,en:sunflower-seeds,en:roasted-sunflower-seeds,en:unshelled-sunflower-seeds,en:roasted-unshelled-sunflower-seeds,es:semillas-de-girasol-con-cascara-tostadas-aguasal Plant-based foods and beverages,Plant-based foods,Seeds,Sunflower seeds and their products,Sunflower seeds,Roasted sunflower seeds,Unshelled sunflower seeds,Roasted unshelled sunflower seeds,es:Semillas-de-girasol-con-cascara-tostadas-aguasal Argentina argentina Beniparrell,Valencia (provincia),Comunidad Valenciana,España beniparrell,valencia-provincia,comunidad-valenciana,espana Vegetariano,Vegano,Sin gluten en:vegetarian,en:vegan,en:gluten-free Vegetarian,Vegan,Gluten-free ES 21.016540/V EC,ENVASADOR:,IMPORTACO S.A. es-21-016540-v-ec,envasador,importaco-s-a NA Madrid,España Mercadona España en:spain Spain Pipas de girasol y sal. 
NA Frutos de cáscara,Cacahuetes en:nuts,en:peanuts Nuts,Peanuts NA 0 [ pipas-de-girasol-y-sal -> es:pipas-de-girasol-y-sal ] 0 NA 0 NA NA d unknown unknown en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date completed,Characteristics completed,Photos validated,Photos uploaded en:plant-based-foods-and-beverages Plant-based foods and beverages http://en.openfoodfacts.org/images/products/848/000/034/0764/front.6.400.jpg http://en.openfoodfacts.org/images/products/848/000/034/0764/front.6.200.jpg 2359 NA 45.5 5.2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 9.5 32.8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 17.3 2.7 NA NA NA NA NA NA NA NA 9.0 18.2 NA NA NA 3.9878 1.57 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1.155 0.0038 0.129 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 17 17
6 100136 http://world-en.openfoodfacts.org/product/0087703177727/soft-drink foodorigins 1437983923 2015-07-27T07:58:43Z 1445577476 2015-10-23T05:17:56Z Soft Drink South Korea south-korea South Korea south-korea NA Australia en:australia Australia NA NA NA NA NA NA NA NA unknown unknown en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-be-completed, en:characteristics-to-be-completed, en:categories-to-be-completed, en:brands-to-be-completed, en:packaging-to-be-completed, en:quantity-to-be-completed, en:photos-to-be-validated, en:photos-uploaded en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-completed,en:characteristics-to-be-completed,en:categories-to-be-completed,en:brands-to-be-completed,en:packaging-to-be-completed,en:quantity-to-be-completed,en:photos-to-be-validated,en:photos-uploaded To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Characteristics to be completed,Categories to be completed,Brands to be completed,Packaging to be completed,Quantity to be completed,Photos to be validated,Photos uploaded http://en.openfoodfacts.org/images/products/008/770/317/7727/front.8.400.jpg http://en.openfoodfacts.org/images/products/008/770/317/7727/front.8.200.jpg NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# View structure of food
str(food, give.attr = FALSE)
## 'data.frame':    1500 obs. of  160 variables:
##  $ V1                                        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ code                                      : int  100030 100050 100079 100094 100124 100136 100194 100221 100257 100258 ...
##  $ url                                       : chr  "http://world-en.openfoodfacts.org/product/3222475745867/confiture-de-fraise-fraise-des-bois-au-sucre-de-canne-casino-delices" "http://world-en.openfoodfacts.org/product/5410976880110/guylian-sea-shells-selection" "http://world-en.openfoodfacts.org/product/3264750423503/pates-de-fruits-aromatisees-jacquot" "http://world-en.openfoodfacts.org/product/8006040247001/nata-vegetal-a-base-de-soja-valsoia" ...
##  $ creator                                   : chr  "sebleouf" "foodorigins" "domdom26" "javichu" ...
##  $ created_t                                 : int  1424747544 1450316429 1428674916 1420416591 1420501121 1437983923 1442420988 1435686217 1436991777 1400516512 ...
##  $ created_datetime                          : chr  "2015-02-24T03:12:24Z" "2015-12-17T01:40:29Z" "2015-04-10T14:08:36Z" "2015-01-05T00:09:51Z" ...
##  $ last_modified_t                           : int  1438445887 1450817956 1428739289 1420417876 1445700917 1445577476 1442420988 1451405288 1436991779 1437236856 ...
##  $ last_modified_datetime                    : chr  "2015-08-01T16:18:07Z" "2015-12-22T20:59:16Z" "2015-04-11T08:01:29Z" "2015-01-05T00:31:16Z" ...
##  $ product_name                              : chr  "Confiture de fraise fraise des bois au sucre de canne" "Guylian Sea Shells Selection" "Pâtes de fruits aromatisées" "Nata vegetal a base de soja &quot;Valsoia&quot;" ...
##  $ generic_name                              : chr  "" "" "Pâtes de fruits" "Nata vegetal a base de soja" ...
##  $ quantity                                  : chr  "265 g" "375g" "1 kg" "200 ml" ...
##  $ packaging                                 : chr  "Bocal,Verre" "Plastic,Box" "Carton,plastique" "Tetra Brik" ...
##  $ packaging_tags                            : chr  "bocal,verre" "plastic,box" "carton,plastique" "tetra-brik" ...
##  $ brands                                    : chr  "Casino Délices" "Guylian" "Jacquot" "Valsoia,//Propiedad de://,Valsoia S.p.A." ...
##  $ brands_tags                               : chr  "casino-delices" "guylian" "jacquot" "valsoia,propiedad-de,valsoia-s-p-a" ...
##  $ categories                                : chr  "Aliments et boissons à base de végétaux,Aliments d'origine végétale,Aliments à base de fruits et de légu"| __truncated__ "Chocolate" "pâtes de fruits" "Alimentos y bebidas de origen vegetal,Alimentos de origen vegetal,Natas vegetales,Natas vegetales a base de soj"| __truncated__ ...
##  $ categories_tags                           : chr  "en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:breakfasts,en:s"| __truncated__ "en:sugary-snacks,en:chocolates" "en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:sugary-snacks,e"| __truncated__ "en:plant-based-foods-and-beverages,en:plant-based-foods,en:plant-based-creams,en:plant-based-creams-for-cooking"| __truncated__ ...
##  $ categories_en                             : chr  "Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Breakfasts,Spreads,Fruits b"| __truncated__ "Sugary snacks,Chocolates" "Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Sugary snacks,Confectioneri"| __truncated__ "Plant-based foods and beverages,Plant-based foods,Plant-based creams,Plant-based creams for cooking,Soy-based c"| __truncated__ ...
##  $ origins                                   : chr  "" "" "" "" ...
##  $ origins_tags                              : chr  "" "" "" "" ...
##  $ manufacturing_places                      : chr  "France" "Belgium" "" "Italia" ...
##  $ manufacturing_places_tags                 : chr  "france" "belgium" "" "italia" ...
##  $ labels                                    : chr  "" "" "" "Vegetariano,Vegano,Sin gluten,Sin OMG,Sin lactosa" ...
##  $ labels_tags                               : chr  "" "" "" "en:vegetarian,en:vegan,en:gluten-free,en:no-gmos,en:no-lactose" ...
##  $ labels_en                                 : chr  "" "" "" "Vegetarian,Vegan,Gluten-free,No GMOs,No lactose" ...
##  $ emb_codes                                 : chr  "EMB 78015" "" "" "" ...
##  $ emb_codes_tags                            : chr  "emb-78015" "" "" "" ...
##  $ first_packaging_code_geo                  : chr  "48.983333,2.066667" "" "" "" ...
##  $ cities                                    : logi  NA NA NA NA NA NA ...
##  $ cities_tags                               : chr  "andresy-yvelines-france" "" "" "" ...
##  $ purchase_places                           : chr  "Lyon,France" "NSW,Australia" "France" "Madrid,España" ...
##  $ stores                                    : chr  "Casino" "" "" "El Corte Inglés" ...
##  $ countries                                 : chr  "France" "Australia" "France" "España" ...
##  $ countries_tags                            : chr  "en:france" "en:australia" "en:france" "en:spain" ...
##  $ countries_en                              : chr  "France" "Australia" "France" "Spain" ...
##  $ ingredients_text                          : chr  "Sucre de canne, fraises 40 g, fraises des bois 14 g, gélifiant : pectines de fruits, jus de citron concentré."| __truncated__ "" "Pulpe de pommes 50% , sucre, sirop de glucose, gélifiant : pectine, acidifiant : acide citrique, arômes, colo"| __truncated__ "Extracto de soja (78%) (agua, semillas de soja 8,3%), grasas vegetales, jarabe de glucosa, dextrosa, emulsionan"| __truncated__ ...
##  $ allergens                                 : chr  "" "" "" "" ...
##  $ allergens_en                              : logi  NA NA NA NA NA NA ...
##  $ traces                                    : chr  "Lait,Fruits à coque" "" "" "" ...
##  $ traces_tags                               : chr  "en:milk,en:nuts" "" "" "" ...
##  $ traces_en                                 : chr  "Milk,Nuts" "" "" "" ...
##  $ serving_size                              : chr  "15 g" "" "" "" ...
##  $ no_nutriments                             : logi  NA NA NA NA NA NA ...
##  $ additives_n                               : int  1 NA 2 5 0 NA NA 0 NA 1 ...
##  $ additives                                 : chr  "[ sucre-de-canne -> fr:sucre-de-canne  ]  [ sucre-de -> fr:sucre-de  ]  [ sucre -> fr:sucre  ]  [ fraises-40-g "| __truncated__ "" "[ pulpe-de-pommes-50 -> fr:pulpe-de-pommes-50  ]  [ pulpe-de-pommes -> fr:pulpe-de-pommes  ]  [ pulpe-de -> fr:"| __truncated__ "[ extracto-de-soja -> es:extracto-de-soja  ]  [ 78 -> es:78  ]  [ agua -> es:agua  ]  [ semillas-de-soja-8 -> e"| __truncated__ ...
##  $ additives_tags                            : chr  "en:e440" "" "en:e440,en:e330" "en:e471,en:e415,en:e407,en:e412,en:e306" ...
##  $ additives_en                              : chr  "E440 - Pectins" "" "E440 - Pectins,E330 - Citric acid" "E471 - Mono- and diglycerides of fatty acids,E415 - Xanthan gum,E407 - Carrageenan,E412 - Guar gum,E306 - Tocop"| __truncated__ ...
##  $ ingredients_from_palm_oil_n               : int  0 NA 0 0 0 NA NA 0 NA 0 ...
##  $ ingredients_from_palm_oil                 : logi  NA NA NA NA NA NA ...
##  $ ingredients_from_palm_oil_tags            : chr  "" "" "" "" ...
##  $ ingredients_that_may_be_from_palm_oil_n   : int  0 NA 0 1 0 NA NA 0 NA 0 ...
##  $ ingredients_that_may_be_from_palm_oil     : logi  NA NA NA NA NA NA ...
##  $ ingredients_that_may_be_from_palm_oil_tags: chr  "" "" "" "e471-mono-et-diglycerides-d-acides-gras-alimentaires" ...
##  $ nutrition_grade_uk                        : logi  NA NA NA NA NA NA ...
##  $ nutrition_grade_fr                        : chr  "d" "" "" "d" ...
##  $ pnns_groups_1                             : chr  "Sugary snacks" "Sugary snacks" "Fruits and vegetables" "unknown" ...
##  $ pnns_groups_2                             : chr  "Sweets" "Chocolate products" "Fruits" "unknown" ...
##  $ states                                    : chr  "en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be"| __truncated__ "en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-b"| __truncated__ "en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be"| __truncated__ "en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-compl"| __truncated__ ...
##  $ states_tags                               : chr  "en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-com"| __truncated__ "en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-c"| __truncated__ "en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-com"| __truncated__ "en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-completed"| __truncated__ ...
##  $ states_en                                 : chr  "To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characte"| __truncated__ "To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Cha"| __truncated__ "To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characte"| __truncated__ "To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date completed,Characteristic"| __truncated__ ...
##  $ main_category                             : chr  "en:plant-based-foods-and-beverages" "en:sugary-snacks" "en:plant-based-foods-and-beverages" "en:plant-based-foods-and-beverages" ...
##  $ main_category_en                          : chr  "Plant-based foods and beverages" "Sugary snacks" "Plant-based foods and beverages" "Plant-based foods and beverages" ...
##  $ image_url                                 : chr  "http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.400.jpg" "http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.400.jpg" "http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.400.jpg" "http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.400.jpg" ...
##  $ image_small_url                           : chr  "http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.200.jpg" "http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.200.jpg" "http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.200.jpg" "http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.200.jpg" ...
##  $ energy_100g                               : num  918 NA NA 766 2359 ...
##  $ energy_from_fat_100g                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ fat_100g                                  : num  0 NA NA 16.7 45.5 NA NA 25 NA 4 ...
##  $ saturated_fat_100g                        : num  0 NA NA 9.9 5.2 NA NA 17 NA 0.54 ...
##  $ butyric_acid_100g                         : logi  NA NA NA NA NA NA ...
##  $ caproic_acid_100g                         : logi  NA NA NA NA NA NA ...
##  $ caprylic_acid_100g                        : logi  NA NA NA NA NA NA ...
##  $ capric_acid_100g                          : logi  NA NA NA NA NA NA ...
##  $ lauric_acid_100g                          : logi  NA NA NA NA NA NA ...
##  $ myristic_acid_100g                        : logi  NA NA NA NA NA NA ...
##  $ palmitic_acid_100g                        : logi  NA NA NA NA NA NA ...
##  $ stearic_acid_100g                         : logi  NA NA NA NA NA NA ...
##  $ arachidic_acid_100g                       : logi  NA NA NA NA NA NA ...
##  $ behenic_acid_100g                         : logi  NA NA NA NA NA NA ...
##  $ lignoceric_acid_100g                      : logi  NA NA NA NA NA NA ...
##  $ cerotic_acid_100g                         : logi  NA NA NA NA NA NA ...
##  $ montanic_acid_100g                        : logi  NA NA NA NA NA NA ...
##  $ melissic_acid_100g                        : logi  NA NA NA NA NA NA ...
##  $ monounsaturated_fat_100g                  : num  NA NA NA 2.9 9.5 NA NA NA NA NA ...
##  $ polyunsaturated_fat_100g                  : num  NA NA NA 3.9 32.8 NA NA NA NA NA ...
##  $ omega_3_fat_100g                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ alpha_linolenic_acid_100g                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ eicosapentaenoic_acid_100g                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ docosahexaenoic_acid_100g                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ omega_6_fat_100g                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ linoleic_acid_100g                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ arachidonic_acid_100g                     : logi  NA NA NA NA NA NA ...
##  $ gamma_linolenic_acid_100g                 : logi  NA NA NA NA NA NA ...
##  $ dihomo_gamma_linolenic_acid_100g          : logi  NA NA NA NA NA NA ...
##  $ omega_9_fat_100g                          : logi  NA NA NA NA NA NA ...
##  $ oleic_acid_100g                           : logi  NA NA NA NA NA NA ...
##  $ elaidic_acid_100g                         : logi  NA NA NA NA NA NA ...
##  $ gondoic_acid_100g                         : logi  NA NA NA NA NA NA ...
##  $ mead_acid_100g                            : logi  NA NA NA NA NA NA ...
##  $ erucic_acid_100g                          : logi  NA NA NA NA NA NA ...
##   [list output truncated]

Information overload! With 160 variables, it’s hard to get a handle on exactly what the dataset contains.
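
One way to cut through output like this is to count column types and missing values instead of reading every line. The sketch below is only an illustration: `toy` is a made-up stand-in for `food`, with invented values under a few column names borrowed from the real data, and the variable names are assumptions rather than part of the exercise.

```r
# Toy stand-in for food: real-looking column names, made-up values
toy <- data.frame(
  code = c(100030L, 100050L, 100079L),
  product_name = c("Jam", "Chocolate", "Fruit paste"),
  sugars_100g = c(54, NA, NA),
  cities = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# How many columns are there of each type?
type_counts <- table(sapply(toy, class))
print(type_counts)

# Share of missing values per column, from most to least missing
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(round(na_share, 2))

# Columns that contain no data at all
empty_cols <- names(toy)[colSums(!is.na(toy)) == 0]
print(empty_cols)
```

The same three steps should work on `food` itself. Note that `is.na()` does not count empty strings (`""`) as missing, and many character columns in this dataset use them, so the true amount of missing data is likely higher than this check suggests.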

Inspecting variables

The str(), head(), and summary() functions are designed to give you some information about a dataset without being overwhelming. However, this dataset has so many variables that even their output can be intimidating!
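
If you want to stay with base R, these functions can be reined in. As a rough sketch on a made-up data frame (`toy` and its values are assumptions, not the real data), the `list.len` and `vec.len` arguments of str() limit how many columns and how many example values are printed, and summary() can be run on just the columns you care about:

```r
# Toy stand-in for food: real-looking column names, made-up values
toy <- data.frame(
  code = 1:5,
  product_name = c("a", "b", "c", "d", "e"),
  energy_100g = c(918, NA, NA, 766, 2359),
  fat_100g = c(0, NA, NA, 16.7, 45.5),
  sugars_100g = c(54, NA, NA, 4.2, 2.7),
  stringsAsFactors = FALSE
)

# Show only the first 2 columns, with fewer example values per column
str(toy, list.len = 2, vec.len = 1, give.attr = FALSE)

# Summarise a handful of columns instead of all of them
nutrient_summary <- summary(toy[, c("energy_100g", "fat_100g")])
print(nutrient_summary)
```

On the real data you would pass `food` instead of `toy` and pick whichever columns matter for your question.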

The glimpse() function from the dplyr package often formats information in a more approachable way.

Yet another option is to just look at the column names to see what kinds of data you have. As you look at the names, pay particular attention to any pairs that look like duplicates.
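
Spotting duplicate-looking pairs by eye gets tedious with this many columns. Here is a minimal sketch of one way to flag candidates, assuming that near-duplicates share a name and differ only by a trailing `_tags` or `_en`. The vector `cols` is a hand-typed sample of the column names; with the data loaded you could use `names(food)` instead.

```r
# A few of the column names, typed in by hand for illustration
cols <- c("brands", "brands_tags", "countries", "countries_tags",
          "countries_en", "product_name", "sugars_100g")

# Strip a trailing "_tags" or "_en" to get each name's stem
stem <- sub("_(tags|en)$", "", cols)

# Group the names by stem and keep stems shared by more than one column
groups <- split(cols, stem)
dup_groups <- groups[lengths(groups) > 1]
print(dup_groups)
```

A shared stem is only a hint. Before dropping anything, compare the contents of the columns to confirm that they really carry the same information.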

# Load dplyr
library(dplyr)

# View a glimpse of food
glimpse(food)
## Observations: 1,500
## Variables: 160
## $ V1                                         <int> 1, 2, 3, 4, 5, 6, 7...
## $ code                                       <int> 100030, 100050, 100...
## $ url                                        <chr> "http://world-en.op...
## $ creator                                    <chr> "sebleouf", "foodor...
## $ created_t                                  <int> 1424747544, 1450316...
## $ created_datetime                           <chr> "2015-02-24T03:12:2...
## $ last_modified_t                            <int> 1438445887, 1450817...
## $ last_modified_datetime                     <chr> "2015-08-01T16:18:0...
## $ product_name                               <chr> "Confiture de frais...
## $ generic_name                               <chr> "", "", "Pâtes de ...
## $ quantity                                   <chr> "265 g", "375g", "1...
## $ packaging                                  <chr> "Bocal,Verre", "Pla...
## $ packaging_tags                             <chr> "bocal,verre", "pla...
## $ brands                                     <chr> "Casino Délices", ...
## $ brands_tags                                <chr> "casino-delices", "...
## $ categories                                 <chr> "Aliments et boisso...
## $ categories_tags                            <chr> "en:plant-based-foo...
## $ categories_en                              <chr> "Plant-based foods ...
## $ origins                                    <chr> "", "", "", "", "Ar...
## $ origins_tags                               <chr> "", "", "", "", "ar...
## $ manufacturing_places                       <chr> "France", "Belgium"...
## $ manufacturing_places_tags                  <chr> "france", "belgium"...
## $ labels                                     <chr> "", "", "", "Vegeta...
## $ labels_tags                                <chr> "", "", "", "en:veg...
## $ labels_en                                  <chr> "", "", "", "Vegeta...
## $ emb_codes                                  <chr> "EMB 78015", "", ""...
## $ emb_codes_tags                             <chr> "emb-78015", "", ""...
## $ first_packaging_code_geo                   <chr> "48.983333,2.066667...
## $ cities                                     <lgl> NA, NA, NA, NA, NA,...
## $ cities_tags                                <chr> "andresy-yvelines-f...
## $ purchase_places                            <chr> "Lyon,France", "NSW...
## $ stores                                     <chr> "Casino", "", "", "...
## $ countries                                  <chr> "France", "Australi...
## $ countries_tags                             <chr> "en:france", "en:au...
## $ countries_en                               <chr> "France", "Australi...
## $ ingredients_text                           <chr> "Sucre de canne, fr...
## $ allergens                                  <chr> "", "", "", "", "",...
## $ allergens_en                               <lgl> NA, NA, NA, NA, NA,...
## $ traces                                     <chr> "Lait,Fruits à coq...
## $ traces_tags                                <chr> "en:milk,en:nuts", ...
## $ traces_en                                  <chr> "Milk,Nuts", "", ""...
## $ serving_size                               <chr> "15 g", "", "", "",...
## $ no_nutriments                              <lgl> NA, NA, NA, NA, NA,...
## $ additives_n                                <int> 1, NA, 2, 5, 0, NA,...
## $ additives                                  <chr> "[ sucre-de-canne -...
## $ additives_tags                             <chr> "en:e440", "", "en:...
## $ additives_en                               <chr> "E440 - Pectins", "...
## $ ingredients_from_palm_oil_n                <int> 0, NA, 0, 0, 0, NA,...
## $ ingredients_from_palm_oil                  <lgl> NA, NA, NA, NA, NA,...
## $ ingredients_from_palm_oil_tags             <chr> "", "", "", "", "",...
## $ ingredients_that_may_be_from_palm_oil_n    <int> 0, NA, 0, 1, 0, NA,...
## $ ingredients_that_may_be_from_palm_oil      <lgl> NA, NA, NA, NA, NA,...
## $ ingredients_that_may_be_from_palm_oil_tags <chr> "", "", "", "e471-m...
## $ nutrition_grade_uk                         <lgl> NA, NA, NA, NA, NA,...
## $ nutrition_grade_fr                         <chr> "d", "", "", "d", "...
## $ pnns_groups_1                              <chr> "Sugary snacks", "S...
## $ pnns_groups_2                              <chr> "Sweets", "Chocolat...
## $ states                                     <chr> "en:to-be-checked, ...
## $ states_tags                                <chr> "en:to-be-checked,e...
## $ states_en                                  <chr> "To be checked,Comp...
## $ main_category                              <chr> "en:plant-based-foo...
## $ main_category_en                           <chr> "Plant-based foods ...
## $ image_url                                  <chr> "http://en.openfood...
## $ image_small_url                            <chr> "http://en.openfood...
## $ energy_100g                                <dbl> 918, NA, NA, 766, 2...
## $ energy_from_fat_100g                       <dbl> NA, NA, NA, NA, NA,...
## $ fat_100g                                   <dbl> 0.00, NA, NA, 16.70...
## $ saturated_fat_100g                         <dbl> 0.000, NA, NA, 9.90...
## $ butyric_acid_100g                          <lgl> NA, NA, NA, NA, NA,...
## $ caproic_acid_100g                          <lgl> NA, NA, NA, NA, NA,...
## $ caprylic_acid_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ capric_acid_100g                           <lgl> NA, NA, NA, NA, NA,...
## $ lauric_acid_100g                           <lgl> NA, NA, NA, NA, NA,...
## $ myristic_acid_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ palmitic_acid_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ stearic_acid_100g                          <lgl> NA, NA, NA, NA, NA,...
## $ arachidic_acid_100g                        <lgl> NA, NA, NA, NA, NA,...
## $ behenic_acid_100g                          <lgl> NA, NA, NA, NA, NA,...
## $ lignoceric_acid_100g                       <lgl> NA, NA, NA, NA, NA,...
## $ cerotic_acid_100g                          <lgl> NA, NA, NA, NA, NA,...
## $ montanic_acid_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ melissic_acid_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ monounsaturated_fat_100g                   <dbl> NA, NA, NA, 2.9, 9....
## $ polyunsaturated_fat_100g                   <dbl> NA, NA, NA, 3.9, 32...
## $ omega_3_fat_100g                           <dbl> NA, NA, NA, NA, NA,...
## $ alpha_linolenic_acid_100g                  <dbl> NA, NA, NA, NA, NA,...
## $ eicosapentaenoic_acid_100g                 <dbl> NA, NA, NA, NA, NA,...
## $ docosahexaenoic_acid_100g                  <dbl> NA, NA, NA, NA, NA,...
## $ omega_6_fat_100g                           <dbl> NA, NA, NA, NA, NA,...
## $ linoleic_acid_100g                         <dbl> NA, NA, NA, NA, NA,...
## $ arachidonic_acid_100g                      <lgl> NA, NA, NA, NA, NA,...
## $ gamma_linolenic_acid_100g                  <lgl> NA, NA, NA, NA, NA,...
## $ dihomo_gamma_linolenic_acid_100g           <lgl> NA, NA, NA, NA, NA,...
## $ omega_9_fat_100g                           <lgl> NA, NA, NA, NA, NA,...
## $ oleic_acid_100g                            <lgl> NA, NA, NA, NA, NA,...
## $ elaidic_acid_100g                          <lgl> NA, NA, NA, NA, NA,...
## $ gondoic_acid_100g                          <lgl> NA, NA, NA, NA, NA,...
## $ mead_acid_100g                             <lgl> NA, NA, NA, NA, NA,...
## $ erucic_acid_100g                           <lgl> NA, NA, NA, NA, NA,...
## $ nervonic_acid_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ trans_fat_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ cholesterol_100g                           <dbl> NA, NA, NA, 0.00020...
## $ carbohydrates_100g                         <dbl> 54.00, NA, NA, 5.70...
## $ sugars_100g                                <dbl> 54.00, NA, NA, 4.20...
## $ sucrose_100g                               <lgl> NA, NA, NA, NA, NA,...
## $ glucose_100g                               <lgl> NA, NA, NA, NA, NA,...
## $ fructose_100g                              <int> NA, NA, NA, NA, NA,...
## $ lactose_100g                               <dbl> NA, NA, NA, NA, NA,...
## $ maltose_100g                               <lgl> NA, NA, NA, NA, NA,...
## $ maltodextrins_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ starch_100g                                <dbl> NA, NA, NA, NA, NA,...
## $ polyols_100g                               <dbl> NA, NA, NA, NA, NA,...
## $ fiber_100g                                 <dbl> NA, NA, NA, 0.2, 9....
## $ proteins_100g                              <dbl> 0.00, NA, NA, 2.90,...
## $ casein_100g                                <dbl> NA, NA, NA, NA, NA,...
## $ serum_proteins_100g                        <lgl> NA, NA, NA, NA, NA,...
## $ nucleotides_100g                           <lgl> NA, NA, NA, NA, NA,...
## $ salt_100g                                  <dbl> 0.0000000, NA, NA, ...
## $ sodium_100g                                <dbl> 0.0000000, NA, NA, ...
## $ alcohol_100g                               <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_a_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ beta_carotene_100g                         <lgl> NA, NA, NA, NA, NA,...
## $ vitamin_d_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_e_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_k_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_c_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b1_100g                            <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b2_100g                            <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_pp_100g                            <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b6_100g                            <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b9_100g                            <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b12_100g                           <dbl> NA, NA, NA, NA, NA,...
## $ biotin_100g                                <dbl> NA, NA, NA, NA, NA,...
## $ pantothenic_acid_100g                      <dbl> NA, NA, NA, NA, NA,...
## $ silica_100g                                <dbl> NA, NA, NA, NA, NA,...
## $ bicarbonate_100g                           <dbl> NA, NA, NA, NA, NA,...
## $ potassium_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ chloride_100g                              <dbl> NA, NA, NA, NA, NA,...
## $ calcium_100g                               <dbl> NA, NA, NA, NA, NA,...
## $ phosphorus_100g                            <dbl> NA, NA, NA, NA, 1.1...
## $ iron_100g                                  <dbl> NA, NA, NA, NA, 0.0...
## $ magnesium_100g                             <dbl> NA, NA, NA, NA, 0.1...
## $ zinc_100g                                  <dbl> NA, NA, NA, NA, NA,...
## $ copper_100g                                <dbl> NA, NA, NA, NA, NA,...
## $ manganese_100g                             <dbl> NA, NA, NA, NA, NA,...
## $ fluoride_100g                              <dbl> NA, NA, NA, NA, NA,...
## $ selenium_100g                              <dbl> NA, NA, NA, NA, NA,...
## $ chromium_100g                              <lgl> NA, NA, NA, NA, NA,...
## $ molybdenum_100g                            <lgl> NA, NA, NA, NA, NA,...
## $ iodine_100g                                <dbl> NA, NA, NA, NA, NA,...
## $ caffeine_100g                              <lgl> NA, NA, NA, NA, NA,...
## $ taurine_100g                               <lgl> NA, NA, NA, NA, NA,...
## $ ph_100g                                    <lgl> NA, NA, NA, NA, NA,...
## $ fruits_vegetables_nuts_100g                <dbl> 54, NA, NA, NA, NA,...
## $ collagen_meat_protein_ratio_100g           <int> NA, NA, NA, NA, NA,...
## $ cocoa_100g                                 <int> NA, NA, NA, NA, NA,...
## $ chlorophyl_100g                            <lgl> NA, NA, NA, NA, NA,...
## $ carbon_footprint_100g                      <dbl> NA, NA, NA, NA, NA,...
## $ nutrition_score_fr_100g                    <int> 11, NA, NA, 11, 17,...
## $ nutrition_score_uk_100g                    <int> 11, NA, NA, 11, 17,...
# View column names of food
names(food)
##   [1] "V1"                                        
##   [2] "code"                                      
##   [3] "url"                                       
##   [4] "creator"                                   
##   [5] "created_t"                                 
##   [6] "created_datetime"                          
##   [7] "last_modified_t"                           
##   [8] "last_modified_datetime"                    
##   [9] "product_name"                              
##  [10] "generic_name"                              
##  [11] "quantity"                                  
##  [12] "packaging"                                 
##  [13] "packaging_tags"                            
##  [14] "brands"                                    
##  [15] "brands_tags"                               
##  [16] "categories"                                
##  [17] "categories_tags"                           
##  [18] "categories_en"                             
##  [19] "origins"                                   
##  [20] "origins_tags"                              
##  [21] "manufacturing_places"                      
##  [22] "manufacturing_places_tags"                 
##  [23] "labels"                                    
##  [24] "labels_tags"                               
##  [25] "labels_en"                                 
##  [26] "emb_codes"                                 
##  [27] "emb_codes_tags"                            
##  [28] "first_packaging_code_geo"                  
##  [29] "cities"                                    
##  [30] "cities_tags"                               
##  [31] "purchase_places"                           
##  [32] "stores"                                    
##  [33] "countries"                                 
##  [34] "countries_tags"                            
##  [35] "countries_en"                              
##  [36] "ingredients_text"                          
##  [37] "allergens"                                 
##  [38] "allergens_en"                              
##  [39] "traces"                                    
##  [40] "traces_tags"                               
##  [41] "traces_en"                                 
##  [42] "serving_size"                              
##  [43] "no_nutriments"                             
##  [44] "additives_n"                               
##  [45] "additives"                                 
##  [46] "additives_tags"                            
##  [47] "additives_en"                              
##  [48] "ingredients_from_palm_oil_n"               
##  [49] "ingredients_from_palm_oil"                 
##  [50] "ingredients_from_palm_oil_tags"            
##  [51] "ingredients_that_may_be_from_palm_oil_n"   
##  [52] "ingredients_that_may_be_from_palm_oil"     
##  [53] "ingredients_that_may_be_from_palm_oil_tags"
##  [54] "nutrition_grade_uk"                        
##  [55] "nutrition_grade_fr"                        
##  [56] "pnns_groups_1"                             
##  [57] "pnns_groups_2"                             
##  [58] "states"                                    
##  [59] "states_tags"                               
##  [60] "states_en"                                 
##  [61] "main_category"                             
##  [62] "main_category_en"                          
##  [63] "image_url"                                 
##  [64] "image_small_url"                           
##  [65] "energy_100g"                               
##  [66] "energy_from_fat_100g"                      
##  [67] "fat_100g"                                  
##  [68] "saturated_fat_100g"                        
##  [69] "butyric_acid_100g"                         
##  [70] "caproic_acid_100g"                         
##  [71] "caprylic_acid_100g"                        
##  [72] "capric_acid_100g"                          
##  [73] "lauric_acid_100g"                          
##  [74] "myristic_acid_100g"                        
##  [75] "palmitic_acid_100g"                        
##  [76] "stearic_acid_100g"                         
##  [77] "arachidic_acid_100g"                       
##  [78] "behenic_acid_100g"                         
##  [79] "lignoceric_acid_100g"                      
##  [80] "cerotic_acid_100g"                         
##  [81] "montanic_acid_100g"                        
##  [82] "melissic_acid_100g"                        
##  [83] "monounsaturated_fat_100g"                  
##  [84] "polyunsaturated_fat_100g"                  
##  [85] "omega_3_fat_100g"                          
##  [86] "alpha_linolenic_acid_100g"                 
##  [87] "eicosapentaenoic_acid_100g"                
##  [88] "docosahexaenoic_acid_100g"                 
##  [89] "omega_6_fat_100g"                          
##  [90] "linoleic_acid_100g"                        
##  [91] "arachidonic_acid_100g"                     
##  [92] "gamma_linolenic_acid_100g"                 
##  [93] "dihomo_gamma_linolenic_acid_100g"          
##  [94] "omega_9_fat_100g"                          
##  [95] "oleic_acid_100g"                           
##  [96] "elaidic_acid_100g"                         
##  [97] "gondoic_acid_100g"                         
##  [98] "mead_acid_100g"                            
##  [99] "erucic_acid_100g"                          
## [100] "nervonic_acid_100g"                        
## [101] "trans_fat_100g"                            
## [102] "cholesterol_100g"                          
## [103] "carbohydrates_100g"                        
## [104] "sugars_100g"                               
## [105] "sucrose_100g"                              
## [106] "glucose_100g"                              
## [107] "fructose_100g"                             
## [108] "lactose_100g"                              
## [109] "maltose_100g"                              
## [110] "maltodextrins_100g"                        
## [111] "starch_100g"                               
## [112] "polyols_100g"                              
## [113] "fiber_100g"                                
## [114] "proteins_100g"                             
## [115] "casein_100g"                               
## [116] "serum_proteins_100g"                       
## [117] "nucleotides_100g"                          
## [118] "salt_100g"                                 
## [119] "sodium_100g"                               
## [120] "alcohol_100g"                              
## [121] "vitamin_a_100g"                            
## [122] "beta_carotene_100g"                        
## [123] "vitamin_d_100g"                            
## [124] "vitamin_e_100g"                            
## [125] "vitamin_k_100g"                            
## [126] "vitamin_c_100g"                            
## [127] "vitamin_b1_100g"                           
## [128] "vitamin_b2_100g"                           
## [129] "vitamin_pp_100g"                           
## [130] "vitamin_b6_100g"                           
## [131] "vitamin_b9_100g"                           
## [132] "vitamin_b12_100g"                          
## [133] "biotin_100g"                               
## [134] "pantothenic_acid_100g"                     
## [135] "silica_100g"                               
## [136] "bicarbonate_100g"                          
## [137] "potassium_100g"                            
## [138] "chloride_100g"                             
## [139] "calcium_100g"                              
## [140] "phosphorus_100g"                           
## [141] "iron_100g"                                 
## [142] "magnesium_100g"                            
## [143] "zinc_100g"                                 
## [144] "copper_100g"                               
## [145] "manganese_100g"                            
## [146] "fluoride_100g"                             
## [147] "selenium_100g"                             
## [148] "chromium_100g"                             
## [149] "molybdenum_100g"                           
## [150] "iodine_100g"                               
## [151] "caffeine_100g"                             
## [152] "taurine_100g"                              
## [153] "ph_100g"                                   
## [154] "fruits_vegetables_nuts_100g"               
## [155] "collagen_meat_protein_ratio_100g"          
## [156] "cocoa_100g"                                
## [157] "chlorophyl_100g"                           
## [158] "carbon_footprint_100g"                     
## [159] "nutrition_score_fr_100g"                   
## [160] "nutrition_score_uk_100g"

This is a little more manageable. Before moving on, scroll through the column names and see if you can find pairs that might be duplicates.

Removing duplicate info

Wow! That’s a lot of variables. To summarize, there’s information on what was added and when (1:9), meta information about the food (10:17, 22:27), where it came from (18:21, 28:34), what it’s made of (35:52), nutrition grades (53:54), some columns whose purpose is unclear (55:63), and nutritional information (64:159).

There are also many different pairs of columns that contain duplicate information. Luckily, you have a trusty assistant who went through and identified duplicate columns for you.

A vector has been created for you that lists out all of the duplicates; all you need to do is remove those columns from the dataset. Don’t forget, you can use the - operator to specify columns to omit, e.g.:

my_df[, -3] # Omit third column

# Define vector of duplicate cols (don't change)
duplicates <- c(4, 6, 11, 13, 15, 17, 18, 20, 22, 
                24, 25, 28, 32, 34, 36, 38, 40, 
                44, 46, 48, 51, 54, 65, 158)

# Remove duplicates from food: food2
food2 <- food[, -duplicates]
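To see how negative indexing behaves, here is a minimal sketch on a toy data frame. The names toy, dup_idx, and toy2 and all of the values are made up for illustration; they are not part of the food data.

```r
# Toy data frame standing in for food (hypothetical values)
toy <- data.frame(
  id      = 1:3,
  name    = c("a", "b", "c"),
  name_en = c("a", "b", "c"),  # duplicates the information in name
  sugar   = c(1.5, 0, 7)
)

# Drop columns 1 and 3 by position, just like food[, -duplicates]
dup_idx <- c(1, 3)
toy2 <- toy[, -dup_idx]
names(toy2)

# If only one column is left, the result collapses to a plain vector
# unless drop = FALSE is supplied
class(toy[, -c(1, 2, 3)])
class(toy[, -c(1, 2, 3), drop = FALSE])
```

With many columns remaining, as in food2, the collapse to a vector does not occur, but drop = FALSE is a cheap safeguard when the number of removed columns is not known in advance.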

Removing useless info

Your dataset is much more manageable already.

In addition to duplicate columns, there are many columns containing information that you just can’t use. For example, the first few columns contain internal codes that don’t have any meaning to us. There are also some column names that aren’t clear enough to tell what they contain.

All of these columns can be deleted. Once again, your assistant did a splendid job finding the indices for you.

# Define useless vector (don't change)
useless <- c(1, 2, 3, 32:41)

# Remove useless columns from food2: food3
food3 <- food2[, -useless]
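Note that the useless indices refer to positions in food2, not in the original food, because removing columns shifts every later position. The sketch below illustrates the shift on a hypothetical five-column data frame and shows a name-based alternative with setdiff(); the object names are assumptions made for this example only.

```r
# Hypothetical data frame with five columns
toy <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5)

# First pass: remove columns b and d by position
step1 <- toy[, -c(2, 4)]
names(step1)

# Position 2 now points at a different column than it did in toy
names(toy)[2]
names(step1)[2]

# Name-based removal does not depend on position
drop_names <- c("b", "d")
step1_by_name <- toy[, setdiff(names(toy), drop_names)]
names(step1_by_name)
```

Index vectors are convenient when they are handed to you, as they are here, but name-based selection tends to be more robust if the column order might change.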

Finding columns

Looking much nicer! Recall from the first exercise that your goal is to analyze the sugar content of these foods. Therefore, your next step is to look at a summary of the nutrition information.

All of the columns with nutrition info contain the character string “100g” as part of their name, which makes it easy to identify them.

library(stringr)

# Create logical vector flagging the nutrition columns: nutrition
nutrition <- str_detect(names(food3), "100g")

# View a summary of nutrition columns
sum_food3 <- as.data.frame(do.call(cbind, lapply(food3[, nutrition], summary)))
## Warning in (function (..., deparse.level = 1) : number of rows of result is
## not a multiple of vector length (arg 4)
sum_food3 %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
  row_spec(0, bold = T, color = "white", background = "#3f7689")
energy_from_fat_100g fat_100g saturated_fat_100g butyric_acid_100g caproic_acid_100g caprylic_acid_100g capric_acid_100g lauric_acid_100g myristic_acid_100g palmitic_acid_100g stearic_acid_100g arachidic_acid_100g behenic_acid_100g lignoceric_acid_100g cerotic_acid_100g montanic_acid_100g melissic_acid_100g monounsaturated_fat_100g polyunsaturated_fat_100g omega_3_fat_100g alpha_linolenic_acid_100g eicosapentaenoic_acid_100g docosahexaenoic_acid_100g omega_6_fat_100g linoleic_acid_100g arachidonic_acid_100g gamma_linolenic_acid_100g dihomo_gamma_linolenic_acid_100g omega_9_fat_100g oleic_acid_100g elaidic_acid_100g gondoic_acid_100g mead_acid_100g erucic_acid_100g nervonic_acid_100g trans_fat_100g cholesterol_100g carbohydrates_100g sugars_100g sucrose_100g glucose_100g fructose_100g lactose_100g maltose_100g maltodextrins_100g starch_100g polyols_100g fiber_100g proteins_100g casein_100g serum_proteins_100g nucleotides_100g salt_100g sodium_100g alcohol_100g vitamin_a_100g beta_carotene_100g vitamin_d_100g vitamin_e_100g vitamin_k_100g vitamin_c_100g vitamin_b1_100g vitamin_b2_100g vitamin_pp_100g vitamin_b6_100g vitamin_b9_100g vitamin_b12_100g biotin_100g pantothenic_acid_100g silica_100g bicarbonate_100g potassium_100g chloride_100g calcium_100g phosphorus_100g iron_100g magnesium_100g zinc_100g copper_100g manganese_100g fluoride_100g selenium_100g chromium_100g molybdenum_100g iodine_100g caffeine_100g taurine_100g ph_100g fruits_vegetables_nuts_100g collagen_meat_protein_ratio_100g cocoa_100g chlorophyl_100g nutrition_score_fr_100g nutrition_score_uk_100g
Min. 0 0 0 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 0 0.4 0.033 0.08 0.721 1.09 0.25 0.5 logical logical logical logical logical logical logical logical logical logical 0 0 0 0 logical logical 100 0 logical logical 0 8.6 0 0 1.1 logical logical 0 0 0 0 logical 7.5e-07 5e-04 5.3e-06 0 6e-05 0.000176 0.00059 6.6e-05 1.13e-05 2e-07 1.9e-06 9e-07 0.00082 0.00063 4e-05 3e-04 0 0.043 0 5e-05 5e-04 3.6e-05 6.5e-06 2.7e-06 1.44e-06 logical logical 1e-05 logical logical logical 2 12 30 logical -12 -12
1st Qu. 35.975 0.9 0.2 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 3.87 1.6525 1.3 0.0905 0.721 1.09 0.25 0.5165 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 0 0 3.7925 1 1500 1500 100 0.25 1500 1500 9.45 59.1 0.5 1.5 1.1 1500 1500 0.04375 0.0172244094488189 0 0 1500 9.5e-07 0.002125 6.85e-06 0.002 0.0002925 0.00026 0.003325 0.00023 5e-05 4e-07 3.3e-06 0.000685 0.00082 0.067815 0.065 6e-04 0.045 0.19375 0.0012 0.067 9e-04 6.025e-05 6.5e-06 4.525e-06 1.44e-06 1500 1500 1e-05 1500 1500 1500 11.25 13.5 47 1500 1 0
Median 237 6 1.7 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 9.5 3.9 3 0.101 0.721 1.09 0.25 0.533 logical logical logical logical logical logical logical logical logical logical 0 0 13.5 4.05 logical logical 100 0.5 logical logical 39.5 67 1.75 6 1.1 logical logical 0.44979 0.177082677165355 5.5 7e-05 logical 3e-06 0.0044 8.4e-06 0.019 0.00045 0.00093 0.0069 8e-04 7.3e-05 2e-06 4.7e-06 0.00195 0.00082 0.135 0.194 9e-04 0.12 0.3185 0.0042 0.104 0.00167 8.45e-05 6.5e-06 6.35e-06 1.44e-06 logical logical 1e-05 logical logical logical 42 15 60 logical 7 6
Mean 668.407142857143 13.3945006313131 4.87399004267425 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 19.7731428571429 9.98555555555556 3.72588888888889 0.173666666666667 0.721 1.09 0.25 0.533 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 0.0105263157894737 0.0264565217391304 27.9578118686869 12.6564831460674 1500 1500 100 2.93333333333333 1500 1500 30.7285714285714 56.0555555555556 2.82298913043478 7.56324050632911 1.1 1500 1500 1.12053058111111 0.440933823928259 10.0671641791045 0.000303926086956522 1500 1.29393333333333e-05 0.00689818181818182 8.4e-06 0.024971487804878 0.000605 0.00111858823529412 0.008555625 0.0112242105263158 0.000110858823529412 1.42272727272727e-06 4.7e-06 0.00267827857142857 0.00082 0.16921 0.328764615384615 0.0144 0.203958235294118 0.377666666666667 0.00454708108108108 0.106559523809524 0.00158142857142857 8.45e-05 6.5e-06 6.35e-06 1.44e-06 1500 1500 1e-05 1500 1500 1500 36.885 15.6666666666667 57 1500 7.94074074074074 7.63111111111111
3rd Qu. 974 20 6.5 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 29 12.7 3.2 0.2205 0.721 1.09 0.25 0.5495 logical logical logical logical logical logical logical logical logical logical 0 0.002625 55 14.7 logical logical 100 4.4 logical logical 42.85 69.8 3.5 10.675 1.1 logical logical 1.1938 0.47 13 0.0005975 logical 5.5e-06 0.0097 9.95e-06 0.03 0.0009625 0.00127 0.01405 0.001235 0.00017 2.245e-06 6.1e-06 0.005075 0.00082 0.2535 0.367 0.02145 0.1985 0.434 0.00771 0.13 0.00225 0.00010875 6.5e-06 8.175e-06 1.44e-06 logical logical 1e-05 logical logical logical 52.25 17.5 70 logical 15 16
Max. 2900 100 57 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 75 46.2 12.4 0.34 0.721 1.09 0.25 0.566 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 0.1 0.43 100 100 1500 1500 100 8.3 1500 1500 71 70 46.7 61 1.1 1500 1500 102 40 50 0.001346 1500 1e-04 0.032 1.15e-05 0.217 0.0013 0.0066 0.016 0.2 0.000237 2.5e-06 7.5e-06 0.006 0.00082 0.372 1.43 0.042 1 1.155 0.0137 0.333 0.0026 0.000133 6.5e-06 1e-05 1.44e-06 1500 1500 1e-05 1500 1500 1500 80 20 81 1500 28 28
NA’s 1486 708 797 logical logical logical logical logical logical logical logical logical logical logical logical logical logical 1465 1464 1491 1497 1499 1499 1499 1498 logical logical logical logical logical logical logical logical logical logical 1481 1477 708 788 logical logical 1499 1497 logical logical 1493 1491 994 710 1499 logical logical 780 780 1433 1477 logical 1485 1478 1498 1459 1478 1483 1484 1481 1483 1489 1498 1486 1499 1497 1487 1497 1449 1488 1463 1479 1493 1498 1499 1498 1499 logical logical 1499 logical logical logical 1470 1497 1491 logical 825 825

Take a look at the results before moving on. Anything noteworthy about the nutrition data?
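The warning and the alternating "logical" / "1500" entries in the table most likely arise because summary() returns vectors of different lengths (columns that are entirely NA are read as logical and produce a shorter summary), which cbind() then recycles. A more direct way to see how much is missing is to count NA values per column. The sketch below does this on a small made-up data frame; toy, na_count, and na_share are illustrative names, not objects from the exercise.

```r
# Made-up stand-in for the nutrition columns of food3
toy <- data.frame(
  sugars_100g  = c(NA, 0, 12.5, NA),
  fat_100g     = c(1, 2, NA, 4),
  taurine_100g = c(NA, NA, NA, NA)  # all NA, so stored as logical
)

# Number and share of missing values per column
na_count <- colSums(is.na(toy))
na_share <- colMeans(is.na(toy))
na_count

# Columns that contain no data at all
names(toy)[na_share == 1]
```

Applied to the real data, the same idea would be colSums(is.na(food3[, nutrition])), which avoids the recycling problem entirely.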

Replacing missing values

Unfortunately, the summary revealed that the nutrition data are mostly NA values. After consulting with the lab technician, it appears that much of the data is missing because the food just doesn’t have those nutrients.

But all is not lost! The lab tech also said that for sugar content, zero values are sometimes entered explicitly, but sometimes the values are just left empty to denote a zero. A statistical miracle!

In this exercise, you’ll replace all NA values with zeroes in the sugars_100g column and make histograms to visualize the result. Then, you will exclude the observations which have no sugar to see how the distribution changes.

# Find indices of sugar NA values: missing
missing <- is.na(food3$sugars_100g)

# Replace NA values with 0
food3$sugars_100g[missing] <- 0

# Create first histogram
hist(food3$sugars_100g, breaks = 100)

# Create food4
food4 <- food3[food3$sugars_100g > 0, ]

# Create second histogram
hist(food4$sugars_100g, breaks = 100)

Excluding the observations which don’t contain any sugar, you can better visualize what the underlying distribution looks like. And now, for something completely different.
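The order of the two steps matters: filtering with > 0 before the NA values are replaced would carry NA entries into the result. A minimal sketch on a made-up vector (the values and the names sugar, naive, and nonzero are assumptions for illustration):

```r
# Made-up sugar measurements with missing entries
sugar <- c(NA, 0, 5.2, NA, 12, 0.4)

# Filtering first keeps the NA positions, because NA > 0 evaluates to NA
naive <- sugar[sugar > 0]
naive

# Replace NA with 0 first, then filter
missing <- is.na(sugar)
sugar[missing] <- 0
nonzero <- sugar[sugar > 0]
nonzero
```

The same applies to data frames: subsetting rows with a logical vector that contains NA yields rows filled with NA, so the replacement step should come first.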

Dealing with messy data

Your analysis of sugar content was so impressive that you’ve now been tasked with determining how many of these foods come in some sort of plastic packaging. (No good deed goes unpunished, as they say.)

Your dataset has information about packaging, but there’s a bit of a problem: it’s stored in several different languages (Spanish, French, and English). This takes messy data to a whole new level! There is no R package to selectively translate, but what if you could just work with the messy data directly?

You’re in luck! The root word for plastic is the same in English (plastic), French (plastique), and Spanish (plastico). To get a general idea of how many of these foods are packaged in plastic, you can look through the packaging column for the string “plasti”.

# Find entries containing "plasti": plastic
plastic <- str_detect(food3$packaging, "plasti")

# Print the sum of plastic
sum(plastic)
## [1] 232
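Keep in mind that str_detect() is case-sensitive by default, so this count is probably a lower bound. The sketch below, on a made-up packaging vector, compares the plain pattern with a case-insensitive one built with stringr’s regex() helper, and shows how an NA entry can be handled. The vector contents are assumptions and do not come from food3.

```r
library(stringr)

# Made-up packaging descriptions in three languages
packaging <- c("Plastique", "plastic bag", "Bolsa de plastico", "carton", "")

# Case-sensitive match misses the capitalised "Plastique"
n_sensitive <- sum(str_detect(packaging, "plasti"))
n_sensitive

# Case-insensitive match catches it
n_insensitive <- sum(str_detect(packaging, regex("plasti", ignore_case = TRUE)))
n_insensitive

# str_detect() returns NA for NA input, so drop NAs when summing
packaging_na <- c(packaging, NA)
n_with_na <- sum(str_detect(packaging_na, "plasti"), na.rm = TRUE)
n_with_na
```

An accented spelling such as plástico would also be missed by the pattern "plasti", so the result is best read as a rough estimate rather than an exact count.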